SynthAssess Report

Original Data Sample
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
18 4 274057 1 7 4 7 3 2 1 0 0 8 38 0
24 2 161092 11 9 4 7 3 4 1 0 0 40 38 0
50 1 176969 11 9 0 9 1 4 1 0 1590 40 38 0
39 4 147548 11 9 2 2 0 4 1 0 0 40 38 0
23 4 27776 8 11 4 6 1 4 1 0 0 40 38 0
Synthetic Data Sample
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
47.0 4.0 210495.9 15.0 9.0 0.0 7.0 4.0 4.0 0.0 65.9 0.4 37.2 38.0 0.0
46.9 4.0 267970.2 15.0 9.9 2.0 6.0 0.0 4.0 1.0 163.5 0.4 52.5 38.0 0.0
19.9 4.0 199113.5 15.0 10.0 4.0 11.0 3.0 4.0 0.0 0.0 2.5 16.2 38.0 0.0
21.4 4.0 143013.0 15.0 9.0 4.0 0.0 3.0 4.0 1.0 39.5 0.0 50.1 38.0 0.0
22.2 4.0 240603.6 15.0 9.2 4.0 5.0 1.0 4.0 1.0 147.2 0.0 42.2 38.0 0.0
Range Coverage
Column Range Coverage (%)
age 100.0
fnlwgt 100.0
education 100.0
education-num 100.0
marital-status 100.0
occupation 100.0
relationship 100.0
race 100.0
sex 100.0
capital-gain 100.0
capital-loss 100.0
hours-per-week 100.0
income 100.0
native-country 97.5
workclass 87.5
Mean Range Coverage 99.0
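The coverage figures above are consistent with a range-overlap measure: the share of each original column's [min, max] interval that the synthetic column's interval also spans. The sketch below is an assumption about how the metric is computed, not a confirmed description of the SynthAssess implementation, and the function name range_coverage is hypothetical. The min and max values are taken from the Original Data Description and Synthetic Data Description tables in this report.

```python
def range_coverage(orig_min, orig_max, synth_min, synth_max):
    """Percentage of the original [min, max] interval spanned by the synthetic data."""
    width = orig_max - orig_min
    if width == 0:
        # Constant column: covered only if the synthetic range contains that value.
        return 100.0 if synth_min <= orig_min <= synth_max else 0.0
    # Fraction of the original range missed at the low end and at the high end.
    missed_low = max((synth_min - orig_min) / width, 0.0)
    missed_high = max((orig_max - synth_max) / width, 0.0)
    return max(1.0 - (missed_low + missed_high), 0.0) * 100.0


# native-country: original 0..40, synthetic 0..39
native_country = range_coverage(0, 40, 0, 39)
# workclass: original 0..8, synthetic 0..7
workclass = range_coverage(0, 8, 0, 7)
# The other 13 columns have identical min and max in both tables (100% each).
mean_coverage = (13 * 100.0 + native_country + workclass) / 15

print(round(native_country, 1), round(workclass, 1), round(mean_coverage, 1))
# prints: 97.5 87.5 99.0
```

Under this reading, a column loses coverage only when the synthetic data fails to reach one of the original extremes, which is why only native-country and workclass fall below 100.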
Original Data Description
index age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
count 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.00000 4000.000000 4000.000000 4000.00000 4000.000000 4000.000000 4000.000000 4000.00000 4000.000000 4000.000000
mean 38.648750 3.857750 189037.437000 10.345750 10.069250 2.58200 5.724500 1.431500 3.65925 0.674250 1110.602750 84.769250 40.28725 35.895500 0.239750
std 13.599434 1.476672 101210.507509 3.816064 2.566535 1.49276 3.985232 1.604354 0.86477 0.468713 7774.422146 397.072537 12.33934 7.391701 0.426984
min 17.000000 0.000000 13769.000000 0.000000 1.000000 0.00000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 1.00000 0.000000 0.000000
25% 28.000000 4.000000 119429.500000 9.000000 9.000000 2.00000 2.000000 0.000000 4.00000 0.000000 0.000000 0.000000 40.00000 38.000000 0.000000
50% 37.000000 4.000000 178513.500000 11.000000 10.000000 2.00000 6.000000 1.000000 4.00000 1.000000 0.000000 0.000000 40.00000 38.000000 0.000000
75% 47.000000 4.000000 239548.250000 12.000000 12.000000 4.00000 9.000000 3.000000 4.00000 1.000000 0.000000 0.000000 45.00000 38.000000 0.000000
max 90.000000 8.000000 816750.000000 15.000000 16.000000 6.00000 13.000000 5.000000 4.00000 1.000000 99999.000000 3004.000000 99.00000 40.000000 1.000000
Synthetic Data Description
index age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
count 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000
mean 38.341450 3.640500 186132.738250 14.715250 9.933025 2.373750 4.495750 1.292750 3.676000 0.697000 873.662600 72.148300 40.511200 35.711000 0.229750
std 11.483542 1.552856 76989.747024 1.698202 2.323304 1.521887 4.222374 1.569606 0.868742 0.459613 5952.854036 346.021566 8.907642 8.555979 0.420725
min 17.000000 0.000000 13769.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000
25% 29.300000 4.000000 135897.850000 15.000000 9.000000 2.000000 1.000000 0.000000 4.000000 0.000000 0.000000 0.000000 37.900000 38.000000 0.000000
50% 37.600000 4.000000 177197.200000 15.000000 10.000000 2.000000 3.000000 1.000000 4.000000 1.000000 84.500000 0.000000 40.500000 38.000000 0.000000
75% 46.000000 4.000000 225299.725000 15.000000 11.800000 4.000000 9.000000 3.000000 4.000000 1.000000 304.450000 0.600000 43.725000 38.000000 0.000000
max 90.000000 7.000000 816750.000000 15.000000 16.000000 6.000000 13.000000 5.000000 4.000000 1.000000 99999.000000 3004.000000 99.000000 39.000000 1.000000
Comparison of Descriptive Statistics
Bivariate Correlation Matrix
Scatter Plot Comparison

Average k-NN Distance
statistic Original Synthetic
count 40000.0 40000.0
mean 1069.781929112338 975.3411489253886
std 4487.1212468392805 5186.126488307377
min 16.42677366416198 29.314062832949947
25% 134.87354202066683 180.67553562090984
50% 245.2749000186657 295.0973547213475
75% 524.7033876568216 620.0218918643225
max 170121.07954765612 278613.08140938735

Average Neighbours for Original Samples
statistic value
count 40000.0
mean 0.5
std 0.5000062501171899
min 0.0
25% 0.0
50% 0.5
75% 1.0
max 1.0

k-NN Distance Benchmark
NNeighbours for Original Sample

Each privacy evaluation below reports success rates for three attacks:

Main: the main privacy attack, in which the attacker uses the synthetic data to guess information about records in the original data.

Baseline: the baseline attack, which models a naive attacker who ignores the synthetic data and guesses randomly.

Control: the control privacy attack, in which the attacker uses the synthetic data to guess information about records in the control dataset.

Singling Out Results

Overall Singling Out PrivacyRisk(value=0.12259447414492806, ci=(1.8356336432573395e-05, 0.24517059195342356))

Main: SuccessRate(value=0.18425818370534194, error=0.10088397692500792)

Baseline: SuccessRate(value=0.035673799566679355, error=0.03567379956667936)

Control: SuccessRate(value=0.07027960018865821, error=0.06041235889372074)

Linkability Results

Overall Linkage PrivacyRisk(value=0.05778039010758086, ci=(0.0, 0.13903281943701717))

Main: SuccessRate(value=0.09139294361867784, error=0.07077797326970388)

Baseline: SuccessRate(value=0.05424684758401218, error=0.05070758831236596)

Control: SuccessRate(value=0.035673799566679355, error=0.03567379956667936)
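The overall risk values reported for singling out and linkability are consistent with normalizing the main attack's success rate against the control attack's: risk = (main - control) / (1 - control). The sketch below checks that arithmetic against the numbers in this report. It is a reading of the reported figures rather than a confirmed description of the tool's internals, and the helper name privacy_risk is hypothetical.

```python
def privacy_risk(main_rate, control_rate):
    """Excess success of the main attack over the control attack, rescaled to [0, 1]."""
    if control_rate >= 1.0:
        # Control attack always succeeds, so no excess risk can be measured.
        return 0.0
    return max((main_rate - control_rate) / (1.0 - control_rate), 0.0)


# Success rates copied from the Singling Out and Linkability results.
singling_out = privacy_risk(0.18425818370534194, 0.07027960018865821)
linkability = privacy_risk(0.09139294361867784, 0.035673799566679355)

print(round(singling_out, 4), round(linkability, 4))
# prints: 0.1226 0.0578
```

The baseline rate does not enter this calculation; under this reading it serves as a sanity check that the main attack does better than random guessing.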

Inference Results

Inference Attack
Original Data Classification Report
index precision recall f1-score support
0 0.88 0.94 0.91 757.00
1 0.77 0.58 0.66 243.00
accuracy 0.86 0.86 0.86 1000.00
macro avg 0.82 0.76 0.78 1000.00
weighted avg 0.85 0.86 0.85 1000.00
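In the table above, the weighted avg row weights each class's score by its support, so the majority class (0) dominates. A minimal sketch of that arithmetic for the f1-score column, using the rounded per-class values from the table; the helper name weighted_average is hypothetical, and because the inputs are already rounded to two decimals the result is only expected to match the report at that precision.

```python
def weighted_average(scores, supports):
    """Support-weighted mean of per-class scores."""
    total = sum(supports)
    return sum(score * n for score, n in zip(scores, supports)) / total


f1_scores = [0.91, 0.66]   # classes 0 and 1, original-data report
supports = [757, 243]      # number of test records per class

weighted_f1 = weighted_average(f1_scores, supports)
print(round(weighted_f1, 2))
# prints: 0.85
```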
Synthetic Data Classification Report
index precision recall f1-score support
0 0.87 0.93 0.90 757.00
1 0.71 0.56 0.63 243.00
accuracy 0.84 0.84 0.84 1000.00
macro avg 0.79 0.74 0.76 1000.00
weighted avg 0.83 0.84 0.83 1000.00
ROC Curve
Data Discriminator Original X Synthetic
index precision recall f1-score support
0 0.99 0.99 0.99 804.00
1 0.99 0.99 0.99 796.00
accuracy 0.99 0.99 0.99 1600.00
macro avg 0.99 0.99 0.99 1600.00
weighted avg 0.99 0.99 0.99 1600.00
Feature Importance Original X Synthetic
Data Discriminator Original X Holdout
index precision recall f1-score support
0 0.69 0.61 0.65 199.00
1 0.66 0.73 0.69 201.00
accuracy 0.67 0.67 0.67 400.00
macro avg 0.67 0.67 0.67 400.00
weighted avg 0.67 0.67 0.67 400.00
Feature Importance Original X Holdout
Data Discriminator Synthetic X Holdout
index precision recall f1-score support
0 0.99 0.99 0.99 199.0
1 1.00 1.00 1.00 201.0
accuracy 1.00 1.00 1.00 400.0
macro avg 0.99 0.99 0.99 400.0
weighted avg 1.00 1.00 1.00 400.0
Feature Importance Synthetic X Holdout